Abstract


This paper connects nonpositive sectional curvature of a Riemannian
manifold with the displacement convexity of the variance functional on
the space P(M) of probability measures over M. We show that M has
nonpositive sectional curvature and has trivial topology (i.e., is
homeomorphic to $\mathbb{R}^n$) if and only if the variance functional on P(M) is
displacement convex. This is followed by a Jensen type inequality for the variance
functional with respect to Wasserstein barycenters, as well as by a result
comparing the variance of the Wasserstein and linear barycenters of a
probability measure on P(M) (that is, an element of P(P(M))). These
results are applied to invariant measures under isometry group actions,
giving a comparison for the variance functional between the Wasserstein
projection and the $L^2$ projection to the set of invariant measures.

1 Introduction 

In this paper, we study the influence of nonpositive sectional curvature of a
complete Riemannian manifold M on the geometry of the space P(M) of probability
measures, equipped with the Wasserstein metric.

Given a probability measure $\mu$ on M, the variance of $\mu$ is defined by

$$\mathrm{var}(\mu) := \inf_{y \in M} \int_M d^2(x,y)\,d\mu(x),$$

where d denotes the Riemannian distance. A minimizer $y \in M$ in the above is
often called a barycenter of $\mu$. We are interested in the way that the variance,
viewed as a functional on the space P(M) of Borel probability measures on M,
interacts with the geometry on P(M) induced by the Wasserstein distance; the
Wasserstein distance between $\mu, \nu \in P(M)$ is given by

$$W_2^2(\mu,\nu) := \inf_{\pi^1_\#\gamma = \mu,\ \pi^2_\#\gamma = \nu} \int_{M \times M} d^2(x,y)\,d\gamma(x,y), \tag{1.1}$$

where, for i = 1, 2, $\pi^i_\#\gamma$ denotes the pushforward of $\gamma$ by the canonical
projections, $\pi^1(x,y) = x$, $\pi^2(x,y) = y$, respectively. Recall that in general,
the pushforward $T_\#\sigma$ of a measure $\sigma$ by a map $T : X \to Y$ is defined by
$T_\#\sigma(A) := \sigma(T^{-1}(A))$ for all measurable sets $A \subset Y$.
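
To make the two definitions concrete, here is a minimal sketch in the simplest possible setting, which is an assumption of this illustration and not the setting of the paper: M is the real line and all measures are equal-weight sums of point masses. There the barycenter is the ordinary mean, and the optimal coupling between two n-point measures is the monotone (sorted) pairing. The function names `variance` and `w2_squared` are hypothetical.

```python
from statistics import fmean


def variance(points):
    """var(mu) = min_y mean((x - y)^2) for an equal-weight measure on R.

    On the real line the minimizer (the barycenter) is the ordinary mean.
    """
    pts = [float(x) for x in points]
    y = fmean(pts)
    return fmean([(x - y) ** 2 for x in pts])


def w2_squared(xs, ys):
    """W_2^2 between two equal-weight n-point measures on R.

    Assumption: in one dimension the sorted pairing is optimal for the
    quadratic cost, so no general transport solver is needed.
    """
    if len(xs) != len(ys):
        raise ValueError("both measures must have the same number of atoms")
    pairs = zip(sorted(xs), sorted(ys))
    return fmean([(x - y) ** 2 for x, y in pairs])


mu = [0.0, 2.0]
nu = [3.0, 2.0]
print(variance(mu))        # variance of (delta_0 + delta_2)/2
print(w2_squared(mu, nu))  # squared Wasserstein distance between mu and nu
```

In this toy setting the variance of a measure equals its squared Wasserstein distance to the Dirac mass at its barycenter, which is the identity used repeatedly later in the paper.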

We will show that the combination of nonpositive sectional curvature together
with trivial topology is characterized by displacement convexity of the
variance; that is, convexity along geodesics on P(M) induced by the Wasserstein
metric (see Theorem 2.1 below). The notion of displacement interpolation,
initiated by McCann, gives a natural geometric way to interpolate between
two probability measures. In turn, convexity of certain functionals with respect
to this interpolation, known as displacement convexity, has proven to be a
remarkably powerful tool in proving geometric and functional inequalities, and
has found applications in physics and economics as well; see, e.g. ! 1911211 . 

Let us note that there are already many known characterizations of nonpositive
sectional curvature; in fact there is one involving the variance functional,
due to Sturm [18, Theorem 4.9], which applies to more general spaces
than we consider here. We believe, however, that it is interesting to have a
characterization involving displacement convexity, particularly in light of the
now well known characterization of Ricci curvature bounds involving displacement
convexity of the entropy functional, developed by many authors, including
Cordero-Erausquin-McCann-Schmuckenschläger [4], Otto-Villani and
Sturm-von Renesse [21], and culminating in the recent work of Lott-Villani [9]
and Sturm.

Note that, unlike many other interesting displacement convex functionals,
the variance functional is well defined and finite as soon as the measure has
finite second moment (i.e., one does not require absolute continuity with respect
to volume), and is weak-* continuous. This property makes it particularly
well suited for studying sectional curvature bounds. Heuristically, displacement
interpolation moves a measure along a family of non-intersecting geodesics
with fixed endpoints. Nonnegative Ricci curvature tends to pull those geodesics
apart at intermediate times; this is quantified by the displacement convexity
of the entropy in the works cited above. Our setting is slightly different; we
expect nonpositive sectional curvature to contract geodesics at intermediate
times in a certain sense. However, as sectional curvature is a property of two
dimensional sections of the tangent space, this contraction may not be detectable
by functionals which are finite only on absolutely continuous measures. For
instance, if the sectional curvature of some section is positive, but the Ricci
curvature is everywhere negative, the volume of a small ball will get contracted.
However, a set which is concentrated and interpolated along the directions with
positive sectional curvature can get spread out in a certain sense; the variance
turns out to be an appropriate way to quantify this.

We go on to extend the convexity of the variance to convexity with respect
to Wasserstein barycenters; see Theorem 3.6. Analogously to the definition
of barycenters of measures on M, a barycenter $BC^W(\Omega)$ of a measure $\Omega$ on
P(M), which we call a Wasserstein barycenter, or simply, $W_2$-barycenter, of $\Omega$
is defined as a minimizer of

$$\nu \mapsto \int_{P(M)} W_2^2(\mu,\nu)\,d\Omega(\mu). \tag{1.2}$$

The notion of Wasserstein barycenters was considered by Agueh-Carlier [1] when
M is a subset of Euclidean space, $M \subset \mathbb{R}^n$, and $\Omega$ is a discrete measure
on P(M), and later by the present authors [8] for Riemannian manifolds M
and general probability measures $\Omega$ on P(M). It extends displacement
interpolation, allowing one to interpolate between several (or, in our formulation,
even infinitely many) probability measures in a canonical way. Agueh and
Carlier [1] also considered convexity over Wasserstein barycenters, as a
generalization of displacement convexity. This notion can be interpreted as an analogue
of Jensen’s inequality; this point of view was investigated in [8], where
geometric versions of Jensen’s inequality were established for displacement convex
functionals on Wasserstein spaces over Riemannian manifolds, extending the
Euclidean results of [1].

The displacement convexity of the variance should be contrasted with its
behaviour with respect to linear interpolation of measures. When measures are
interpolated linearly, it is easy to see that the variance is concave, regardless of
the curvature of M. Combined with the ordinary Jensen’s inequality and our
displacement convexity result, this implies that the variance of the Wasserstein
barycenter of any measure $\Omega$ on P(M) is less than or equal to the variance of
its linear barycenter, if M is a nonpositively curved, simply connected space; see
Corollary 4.1. Although this statement is not explicitly linked to convexity and
concavity, we are not aware of another proof which does not use convexity over
Wasserstein barycenters. We present a counterexample demonstrating that this
inequality can fail when the curvature conditions are relaxed.
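
The contrast between the two interpolations can be checked by hand in a flat toy case. The sketch below is only an illustration under stated assumptions: M is the real line (flat and simply connected), both measures are equal-weight sums of the same number of point masses, and the sorted pairing is used as the optimal coupling. The helper names are hypothetical.

```python
from statistics import fmean


def variance(points):
    """Variance of an equal-weight measure on R (barycenter = mean)."""
    pts = [float(x) for x in points]
    m = fmean(pts)
    return fmean([(x - m) ** 2 for x in pts])


def displacement_midpoint(xs, ys):
    """Atoms of the W_2 midpoint of two equal-weight n-point measures on R:
    sorted pairing, each pair replaced by its midpoint."""
    if len(xs) != len(ys):
        raise ValueError("both measures must have the same number of atoms")
    return [(x + y) / 2 for x, y in zip(sorted(xs), sorted(ys))]


def linear_midpoint(xs, ys):
    """Atoms of (mu_0 + mu_1)/2: the union of both supports, equally weighted."""
    if len(xs) != len(ys):
        raise ValueError("both measures must have the same number of atoms")
    return list(xs) + list(ys)


mu0 = [0.0, 2.0]
mu1 = [10.0, 14.0]
v_disp = variance(displacement_midpoint(mu0, mu1))
v_avg = (variance(mu0) + variance(mu1)) / 2
v_lin = variance(linear_midpoint(mu0, mu1))
print(v_disp, v_avg, v_lin)
```

For this pair of measures the displacement midpoint has variance below the average of the two endpoint variances, while the linear midpoint has variance above it, matching the convexity and concavity statements of the paragraph.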

We then turn our attention to the special case when the measure $\Omega$ is
induced by a left invariant measure on an isometry group G acting on M, and
relate our work to the $W_2$ projection $P^W_G(\mu)$ of $\mu \in P(M)$ to the set of
G-invariant measures on M. Connections between optimal transport problems
and measures which are invariant under certain operations have recently begun
to attract considerable attention; see m m m m i, although these works are 
primarily concerned with finding Kantorovich solutions of the optimal transport
problem with certain symmetry constraints, rather than looking at Wasserstein
projections. Our work here implies a comparison result for the variance
functional between the $L^2$ projection and the $W_2$ projection to the G-invariant set.
Namely, when M is nonpositively curved and simply connected, we get, under
suitable conditions on $\mu$,

$$\mathrm{var}(P^W_G(\mu)) \le \mathrm{var}(\mu);$$

see Corollary 5.3. Note that, at first glance, this inequality has no obvious
connection to the barycenter of a family of measures, but we are not aware of
another simple proof of it. Furthermore, it is interesting when contrasted with
the inequality

$$\mathrm{var}(P^L_G(\mu)) \ge \mathrm{var}(\mu),$$

for the $L^2$ projection $P^L_G(\mu)$ of $\mu$ onto the G-invariant set; see (5.1).

The paper is organized as follows: In Section 2 we establish the equivalence, 
on complete Riemannian manifolds, between nonpositive sectional curvature, 
together with simple connectedness, and displacement convexity of the variance. 
In Section 3, we show that this displacement convexity extends to convexity 
over Wasserstein barycenters. Section 4 is devoted to the comparison of the 
behaviour of the variance functional between linear and Wasserstein barycenters. 
Finally, in Section 5, these results are applied to isometry group actions, yielding
comparison results for the $L^2$ and the $W_2$ projections to the set of invariant measures.


2 Displacement convexity of the variance and 
nonpositive sectional curvature 

Before stating the main theorem of this section, we develop some notation. A
well known result of Brenier [3] and McCann asserts that if the measure $\mu$
is absolutely continuous with respect to volume and both $\mu$ and $\nu$ have finite
variance, then there exists a unique minimizer $\gamma$ to the minimization problem
(1.1) and furthermore, $\gamma = (\mathrm{Id}, F)_\#\mu$, where $F : M \to M$ is the unique
mapping such that $F_\#\mu = \nu$ taking the form $F(x) = \exp_x(-\nabla\phi(x))$, where
$\phi : M \to \mathbb{R}$ is a $\frac{d^2}{2}$-convex function; that is, $\phi$ takes the form

$$\phi(x) = \sup_{y \in M} \left( -\frac{d^2(x,y)}{2} - \phi^c(y) \right)$$

for some $\phi^c : M \to \mathbb{R}$. The displacement interpolant between $\mu$ and $\nu$ is then
the map $[0,1] \to P(M)$ given by $\mu_t = \big(\exp_x(-t\nabla\phi(x))\big)_\#\mu$. We note that
this notion of displacement interpolation can be extended to non-absolutely
continuous measures in P(M); for precise definitions, we refer the reader to the
standard books on optimal transport. A functional $F : P(M) \to \mathbb{R} \cup \{\infty\}$ is called displacement convex if
the function $t \mapsto F(\mu_t)$ is convex for every displacement interpolant $\mu_t$.
This section is then devoted to the proof of the following result:

Theorem 2.1. Assume M is simply connected. Then M has nonpositive
sectional curvature if and only if the variance functional is displacement convex.

Proof. This follows from Theorem 2.2 and Corollary 2.8 below. □
2.1 Displacement convexity of variance: necessary condition

In this subsection, a standard argument shows that if the variance is
displacement convex, then the underlying Riemannian manifold has to be simply
connected and nonpositively curved.

Theorem 2.2. Let M be a complete Riemannian manifold. Suppose that variance
is displacement convex, i.e. $\mathrm{var}(\mu_t) \le (1-t)\,\mathrm{var}(\mu_0) + t\,\mathrm{var}(\mu_1)$ for each
displacement interpolation $\mu_t$ of probability measures on M. Then, M is simply
connected and has nonpositive sectional curvature $K \le 0$.

Proof. We first tackle the simple connectedness. The proof is by contradiction; 
assume M is not simply connected. We claim that this implies that each point 
x has a nonempty cut locus. To see this, note that there are homotopically 
nontrivial loops from x to itself. Taking an arc-length minimizing sequence of 
such loops, and noting that each loop in the sequence remains in a compact 
subset of M, we can pass to a convergent subsequence and obtain a geodesic 
loop from x to itself. A cut locus point clearly exists along such a loop. 

By [4, Proposition 2.5], then, for any $x \in M$, there exist $y \in M$ and a
small $v \in T_xM$ such that

$$d^2(\exp_x v, y) + d^2(\exp_x(-v), y) - 2d^2(x,y) < 0. \tag{2.1}$$

Now, take two measures $\mu_0 = \frac{1}{2}[\delta_y + \delta_{\exp_x v}]$ and $\mu_1 = \frac{1}{2}[\delta_y + \delta_{\exp_x(-v)}]$. The
displacement interpolant at $t = \frac{1}{2}$ is clearly $\mu_{1/2} = \frac{1}{2}[\delta_y + \delta_x]$. Note that the
variances of the doubly supported measures $\mu_0$, $\mu_1$ and $\mu_{1/2}$ are, respectively,
$\frac{1}{4}d^2(\exp_x v, y)$, $\frac{1}{4}d^2(\exp_x(-v), y)$ and $\frac{1}{4}d^2(x,y)$. By (2.1), this contradicts the
displacement convexity of the variance. We note that one could also use this to
construct an example with absolutely continuous $\mu_0$ and $\mu_1$: by the weak-*
density of absolutely continuous probability measures, the weak-* continuity of
the variance functional and stability of the displacement interpolation (these
latter two facts are straightforward to prove; see Lemmas 3.3 and 3.2 in the
next section), combined with inequality (2.1), we can find absolutely continuous
measures $\mu_0$ and $\mu_1$ whose displacement interpolant $\mu_{1/2}$ satisfies

$$\mathrm{var}(\mu_1) + \mathrm{var}(\mu_0) < 2\,\mathrm{var}(\mu_{1/2}).$$

This again violates the displacement convexity of the variance, yielding the
desired contradiction and therefore establishing the simple connectedness of M.

We now turn to the sectional curvature assertion. The proof is again by
contradiction; assume a section $\Sigma$ of a tangent space $T_xM$ has positive sectional
curvature. Then, we can find, for some small $\varepsilon > 0$, points $x_0, x_1, y_0, y_1$ with
the following properties:

$$d(x_0,x_1) = d(y_0,y_1) := \varepsilon,$$
$$d(\gamma_0(t),\gamma_1(t)) > \varepsilon \quad \text{for some } t \in (0,1),$$
$$d^2(x_0,y_0) + d^2(x_1,y_1) < d^2(x_0,y_1) + d^2(x_1,y_0).$$

Here $\gamma_0(t)$ and $\gamma_1(t)$ are geodesics from $x_0$ to $y_0$ and $x_1$ to $y_1$, respectively.
Now, consider optimal transport between the two measures $\mu_0 = \frac{1}{2}[\delta_{x_0} + \delta_{x_1}]$
and $\mu_1 = \frac{1}{2}[\delta_{y_0} + \delta_{y_1}]$; the optimal plan clearly pairs $x_0$ with $y_0$ and $x_1$ with
$y_1$, and so the displacement interpolant at t is $\mu_t = \frac{1}{2}[\delta_{\gamma_0(t)} + \delta_{\gamma_1(t)}]$. Therefore,
the variances of $\mu_0$, $\mu_1$ and $\mu_t$ are, respectively, $\frac{1}{4}d^2(x_0,x_1)$, $\frac{1}{4}d^2(y_0,y_1)$ and
$\frac{1}{4}d^2(\gamma_0(t),\gamma_1(t))$, and so

$$\mathrm{var}(\mu_t) = \frac{d^2(\gamma_0(t),\gamma_1(t))}{4} > \frac{\varepsilon^2}{4} = (1-t)\frac{d^2(x_0,x_1)}{4} + t\,\frac{d^2(y_0,y_1)}{4} = (1-t)\,\mathrm{var}(\mu_0) + t\,\mathrm{var}(\mu_1).$$

This contradicts displacement convexity; if one wants to consider absolutely
continuous measures instead, then one can argue as before using an approximation
argument. This contradiction yields the desired result. □

2.2 Displacement convexity of variance: sufficient condition

The goal of this subsection is to show that variance, as a functional on the
space of probability measures, is displacement convex if the underlying domain
or manifold M is simply connected and has nonpositive curvature; if our domain
is not complete, then we further assume that it is geodesically convex. Here,
by geodesic convexity of M, we mean that for any given two points in M, any
minimizing geodesic connecting these two points remains in M.

In fact, we prove convexity along a slightly more general family of paths
than displacement interpolations or, equivalently, $W_2$-geodesics in P(M). All
results in this section are obtained using standard calculations and results in
Riemannian geometry.

Definition 2.3 ($W_2$-quasi-geodesic). Let V be a measurable vector field
defined a.e. on a Riemannian manifold M. Define, for $t \in [0,1]$, a measurable
mapping $T_t$ as $T_t(x) = \exp_x(tV(x))$ for a.e. x. Then, for each absolutely
continuous probability measure $\mu$ on M, we call the 1-parameter family $\mu_t = T_{t\#}\mu$
a $W_2$-quasi-geodesic.

Notice that when V is given by $\nabla\phi$ for some c-convex function $\phi$, $W_2$
quasi-geodesics become $W_2$ geodesics. It is convenient at this point to observe
a simple technical fact, which we won’t need in this section but will be used in
the proof of Theorem 3.6 in the following section.

Lemma 2.4 (A first variation along $W_2$ quasi-geodesic). Let $\mu$ be an absolutely
continuous probability measure on M and let $\mu_t = T_{t\#}\mu$ be a $W_2$ quasi-geodesic,
given by the measurable vector field V, $T_t(x) = \exp_x tV(x)$ for a.e. x. Let $\gamma(t)$
be a differentiable curve in M such that $\gamma(0)$ is a barycenter of $\mu$. Then

$$\frac{d}{dt}\Big|_{t=0} W_2^2(\delta_{\gamma(t)}, \mu_t) = -2\int_M \langle \exp_x^{-1}(\gamma(0)), V(x)\rangle\,d\mu(x),$$

where $\langle\cdot,\cdot\rangle$ is the Riemannian inner product. Notice that $\exp_x^{-1}(\gamma(0))$ is defined
whenever x is not in the cut-locus of $\gamma(0)$, which is almost every x, and thus
$\mu$-a.e. for the absolutely continuous measure $\mu$.


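Proof. First note that

$$\nabla_x \tfrac{1}{2} d^2(w,x) = -\exp_x^{-1}(w) \tag{2.2}$$

for a.e. x. We then have

$$\frac{d}{dt}\Big|_{t=0} W_2^2(\delta_{\gamma(t)}, \mu_t) = \frac{d}{dt}\Big|_{t=0} \int_M d^2(\gamma(t), T_t(x))\,d\mu(x)$$
$$= \int_M \Big\langle \nabla_w\big|_{w=\gamma(0)} d^2(w, T_0(x)),\ \gamma'(0)\Big\rangle\,d\mu(x) + \int_M \Big\langle \nabla_y\big|_{y=T_0(x)} d^2(\gamma(0), y),\ \dot T_0(x)\Big\rangle\,d\mu(x)$$
$$= \Big\langle \nabla_w\big|_{w=\gamma(0)} \int_M d^2(w, T_0(x))\,d\mu(x),\ \gamma'(0)\Big\rangle + \int_M \Big\langle \nabla_y\big|_{y=T_0(x)} d^2(\gamma(0), y),\ \dot T_0(x)\Big\rangle\,d\mu(x).$$

Note that in the calculations both above and below, we use the absolute
continuity of the measure $\mu$ so that the non differentiability points of the distance
squared do not affect the calculations. Now, since $T_0(x) = x$,

$$\nabla_w\big|_{w=\gamma(0)} \int_M d^2(w, T_0(x))\,d\mu(x) = \nabla_w\big|_{w=\gamma(0)} \int_M d^2(w, x)\,d\mu(x)$$

vanishes because $\gamma(0)$ is a barycenter of $\mu$. The result now follows from (2.2)
and the observation that $\dot T_0(x) = V(x)$. □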

Note that convexity along $W_2$ quasi-geodesics implies displacement
convexity. Below we will show that the variance functional $\mu \mapsto \mathrm{var}(\mu)$ is convex along
$W_2$ quasi-geodesics.

We will need a simple consequence of the second variation formula of
arc-length; the following Lemma is a special case of a result of Sturm [18, Corollary
2.5] and so we omit the proof.

Lemma 2.5 (Convexity of distance squared for points along two geodesics). Let
M be a simply connected manifold with nonpositive sectional curvature $K \le 0$.
Let $z, w : [0,1] \to M$ be two geodesics. Then $t \mapsto d^2(z(t),w(t))$ is convex.

Remark 2.6. This result can be easily extended to the case when M is a
geodesically convex domain in a complete Riemannian manifold with nonpositive
curvature.

Now, we prove the main theorem of this section: 

Theorem 2.7 (Convexity along $W_2$ quasi-geodesics). Let M be a geodesically
convex domain in a complete simply connected manifold with nonpositive
sectional curvature $K \le 0$. Let $\mu_t$, $a \le t \le b$, be a $W_2$ quasi-geodesic in P(M).
Let $t \in [a,b] \mapsto w(t) \in M$ be a geodesic. Then, $W_2^2(\delta_{w(t)}, \mu_t)$ is convex in t.

Proof. This is an easy corollary of Lemma 2.5 and Remark 2.6. The details
follow. Note that $\mu_t = T_{t\#}\mu$ where for a.e. x, $T_t(x) = \exp_x tV(x)$ for some
vector field V on M.

We now observe

$$W_2^2(\delta_{w(t)}, \mu_t) = \int_M d^2(w(t), z)\,d\mu_t(z) = \int_M d^2(w(t), T_t(x))\,d\mu_0(x),$$

where the first equality is from the definition of $W_2$ distance and the second
equality is from $\mu_t = T_{t\#}\mu_0$. Therefore,

$$\frac{d^2}{dt^2} W_2^2(\delta_{w(t)}, \mu_t) = \int_M \frac{d^2}{dt^2}\, d^2(w(t), T_t(x))\,d\mu_0(x).$$

Now, note that for a fixed x, $t \in [a,b] \mapsto T_t(x)$ is a geodesic, and so using
Lemma 2.5 and Remark 2.6 we see $\frac{d^2}{dt^2}\, d^2(w(t), T_t(x)) \ge 0$. Thus,

$$\frac{d^2}{dt^2} W_2^2(\delta_{w(t)}, \mu_t) \ge 0.$$

This completes the proof. □

From Theorem 2.7, convexity of the variance follows immediately. The
following corollary, together with Theorem 2.2, establishes Theorem 2.1.

Corollary 2.8 (Convexity of variance along $W_2$ quasi-geodesics). Adopt the
notation and assumptions of Theorem 2.7. Then $\mathrm{var}(\mu_t)$ is convex in t.

Proof. For each interval $[\alpha,\beta] \subset [a,b]$, choose a geodesic $t \in [\alpha,\beta] \mapsto w(t) \in M$ with
$w(\alpha)$, $w(\beta)$ being the barycenter points of $\mu_\alpha$, $\mu_\beta$, respectively. Apply
Theorem 2.7 to get convexity of $W_2^2(\delta_{w(t)}, \mu_t)$, and note that $\mathrm{var}(\mu_t) \le W_2^2(\delta_{w(t)}, \mu_t)$,
with equality at $\alpha$ and $\beta$. This establishes the convexity of $\mathrm{var}(\mu_t)$ in t. □
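
As a sanity check of convexity along quasi-geodesics in the flat case, the following sketch pushes a few point masses in the Euclidean plane along straight lines $x + tV(x)$ for an arbitrary (non-gradient) field V and tests midpoint convexity of the variance on a grid of times. This is an illustration only: it assumes M is the plane, and it uses discrete measures for simplicity, whereas Definition 2.3 is stated for absolutely continuous measures. All names are hypothetical.

```python
from statistics import fmean


def variance(points):
    """Variance of an equal-weight measure on R^2 (barycenter = mean)."""
    cx = fmean([p[0] for p in points])
    cy = fmean([p[1] for p in points])
    return fmean([(p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points])


def quasi_geodesic(points, field, t):
    """Push each atom x along the straight line x + t * V(x)."""
    return [(x + t * vx, y + t * vy)
            for (x, y), (vx, vy) in zip(points, field)]


def is_midpoint_convex(points, field, steps=20, tol=1e-12):
    """Check var(mu_t) <= average of its two grid neighbours for t in (0, 1)."""
    ts = [k / steps for k in range(steps + 1)]
    vals = [variance(quasi_geodesic(points, field, t)) for t in ts]
    return all(vals[k] <= (vals[k - 1] + vals[k + 1]) / 2 + tol
               for k in range(1, steps))


atoms = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
field = [(1.0, -1.0), (-2.0, 0.5), (0.3, 0.3)]  # arbitrary, not a gradient
print(is_midpoint_convex(atoms, field))
```

In the plane the variance along such a path is a quadratic polynomial in t whose leading coefficient is the variance of the field values, so the check passes for any choice of V; the content of Theorem 2.7 is that the same convexity survives in nonpositive curvature.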

3 Convexity of the variance functional with respect to $W_2$ barycenters.

In this section, we use convexity along $W_2$ quasi-geodesics to prove a convexity
result with respect to $W_2$ barycenters (see Theorem 3.6 below); recall that $W_2$
barycenters were defined in (1.2). Note that Theorem 3.6 requires no regularity
(i.e., absolute continuity) of the measures $\mu$ in spt($\Omega$). Under suitable
regularity conditions on $\Omega$ (see, for example, case 1 of the proof), the argument is a
straightforward variant of the proof of a similar result (for different
displacement convex functionals) in [8]. Much of the work in this section is related to
the extension to singular measures (in which case the barycenter itself can be
singular and non unique).

Throughout this section, we will assume that M is a compact domain in a
Riemannian manifold. Existence of a $W_2$ barycenter of a probability measure
$\Omega$ on P(M) is easy to show. The $W_2$ barycenter is not generally unique; that
is, there may be multiple minimizers in (1.2). However, uniqueness is known
under a mild structural condition on $\Omega$:

Proposition 3.1. Assume M is compact (e.g. a compact domain in a manifold)
and that $\Omega(P_{ac}(M)) > 0$. Then there exists a unique $W_2$ barycenter.

The proof can be found in [1] and [8]. We will also need the following
stability result.

Lemma 3.2. Assume M is compact and suppose the probability measures $\Omega^N$ on
P(M) converge in the weak-* topology to $\Omega$ (with respect to the $W_2$-distance on
P(M)). Then the limit of any weakly-* convergent subsequence $\bar\mu^N$ of
barycenters of the $\Omega^N$ is a barycenter of $\Omega$.

Proof. The proof is a standard argument. Choose a weakly convergent
subsequence, $\bar\mu^N \to \bar\mu$. For any $\mu \in P(M)$, we have

$$W_2^2(\mu,\bar\mu) \le \big(W_2(\mu,\bar\mu^N) + W_2(\bar\mu^N,\bar\mu)\big)^2 = W_2^2(\mu,\bar\mu^N) + W_2^2(\bar\mu^N,\bar\mu) + 2W_2(\mu,\bar\mu^N)\,W_2(\bar\mu^N,\bar\mu). \tag{3.1}$$

Integrating against $\Omega^N$, we have

$$\int_{P(M)} W_2^2(\mu,\bar\mu)\,d\Omega^N(\mu) \le \int_{P(M)} W_2^2(\mu,\bar\mu^N)\,d\Omega^N(\mu) + W_2^2(\bar\mu^N,\bar\mu) + 2W_2(\bar\mu^N,\bar\mu)\int_{P(M)} W_2(\mu,\bar\mu^N)\,d\Omega^N(\mu). \tag{3.2}$$

Therefore, for any $\nu \in P(M)$, we have, by definition of the barycenter $\bar\mu^N$,

$$\int_{P(M)} W_2^2(\mu,\bar\mu)\,d\Omega^N(\mu) \le \int_{P(M)} W_2^2(\mu,\nu)\,d\Omega^N(\mu) + W_2^2(\bar\mu^N,\bar\mu) + 2W_2(\bar\mu^N,\bar\mu)\int_{P(M)} W_2(\mu,\bar\mu^N)\,d\Omega^N(\mu). \tag{3.3}$$

Now, as weak-* convergence is equivalent to Wasserstein convergence, $W_2(\bar\mu^N,\bar\mu)$
tends to zero as $N \to \infty$, and as the term $W_2(\mu,\bar\mu^N)$ is uniformly bounded by
the compactness of M, the last two terms on the right hand side go to zero.
By weak convergence of the $\Omega^N$, and continuity of $\mu \mapsto W_2^2(\mu,\bar\mu)$ and $\mu \mapsto W_2^2(\mu,\nu)$,
the above inequality tends to

$$\int_{P(M)} W_2^2(\mu,\bar\mu)\,d\Omega(\mu) \le \int_{P(M)} W_2^2(\mu,\nu)\,d\Omega(\mu). \tag{3.4}$$

As $\nu$ is arbitrary, this completes the proof. □


Another standard argument shows:

Lemma 3.3. Assume M is compact. The mapping $\mu \mapsto \mathrm{var}(\mu)$ is continuous
on P(M) with respect to the weak-* topology.

Proof. Suppose $\mu^N \to \mu$ in the weak-* topology. It is easy (in fact, almost
identical to the proof in the preceding Lemma) to show that any convergent
subsequence $x^N$ of barycenters of the $\mu^N$ converges to a barycenter x of $\mu$. We
then need to show $\int_M d^2(y,x^N)\,d\mu^N(y) \to \int_M d^2(y,x)\,d\mu(y)$. We have

$$\Big|\int_M d^2(y,x^N)\,d\mu^N(y) - \int_M d^2(y,x)\,d\mu(y)\Big| \le \Big|\int_M d^2(y,x^N)\,d\mu^N(y) - \int_M d^2(y,x)\,d\mu^N(y)\Big| + \Big|\int_M d^2(y,x)\,d\mu^N(y) - \int_M d^2(y,x)\,d\mu(y)\Big|.$$

As $N \to \infty$, the second term in the right hand side above goes to zero by weak
convergence. The first term can be written as

$$\Big|\int_M [d(y,x^N) + d(y,x)][d(y,x^N) - d(y,x)]\,d\mu^N(y)\Big|$$
$$\le \int_M \big|[d(y,x^N) + d(y,x)][d(y,x^N) - d(y,x)]\big|\,d\mu^N(y)$$
$$\le 2\,\mathrm{diam}(M)\int_M |d(y,x^N) - d(y,x)|\,d\mu^N(y)$$
$$\le 2\,\mathrm{diam}(M)\int_M d(x,x^N)\,d\mu^N(y)$$
$$= 2\,\mathrm{diam}(M)\,d(x,x^N).$$

The result follows. □

Corollary 3.4. Assume M is compact. Suppose the measures $\Omega^N$ on P(M)
converge weakly to $\Omega$. Then

$$\int_{P(M)} \mathrm{var}(\mu)\,d\Omega^N(\mu) \to \int_{P(M)} \mathrm{var}(\mu)\,d\Omega(\mu).$$

Proof. This is an immediate consequence of the continuity of $\mu \mapsto \mathrm{var}(\mu)$
(Lemma 3.3) and the definition of weak convergence. □

Before we prove the main theorem of this section, we make the following
observation:

Lemma 3.5. Suppose $\bar\mu$ is a barycenter of the measure $\Omega$ on P(M). Then $\bar\mu$ is
the unique barycenter of $\frac{1}{2}\delta_{\bar\mu} + \frac{1}{2}\Omega$.

Proof. For any $\nu$, we have

$$\frac{1}{2}W_2^2(\nu,\bar\mu) + \frac{1}{2}\int_{P(M)} W_2^2(\mu,\nu)\,d\Omega(\mu) \ge \frac{1}{2}W_2^2(\nu,\bar\mu) + \frac{1}{2}\int_{P(M)} W_2^2(\mu,\bar\mu)\,d\Omega(\mu) \ge \frac{1}{2}\int_{P(M)} W_2^2(\mu,\bar\mu)\,d\Omega(\mu),$$

with equality if and only if $\nu = \bar\mu$. □

Now we state and prove the main theorem of this section.

Theorem 3.6 (Convexity of variance with respect to the barycenter). Assume
that M is a compact, geodesically convex domain in a complete nonpositively
curved manifold; thus, $\mu \mapsto \mathrm{var}(\mu)$ is convex along $W_2$ quasi-geodesics. Let $\Omega$
be a Borel probability measure on P(M). Let $\bar\mu \in P(M)$ be a $W_2$ barycenter of
$\Omega$. Then, we have

$$\mathrm{var}(\bar\mu) \le \int_{P(M)} \mathrm{var}(\mu)\,d\Omega(\mu).$$

Proof. The proof is divided into three successively more general cases.

Case 1: The measure $\Omega = \sum_{i=1}^N \lambda_i \delta_{\mu^i}$ has finite support and one of the $\mu^i$
is absolutely continuous with respect to volume.

In this case, from the result of [15], the barycenter $\bar\mu$ is unique and absolutely
continuous with respect to the volume measure (this also holds without the
curvature assumption). We will need to set up some relevant notation. Let
$T^i$ be the optimal map from $\bar\mu$ to $\mu^i$; by the Brenier-McCann theorem (see [10]),
for a.e. x, $T^i(x) = \exp_x \nabla\phi^i(x)$ for some $d^2/2$-convex function $\phi^i$. Moreover,
by a straightforward adaptation of a result of Agueh-Carlier [1] (see also [8] for
more general cases), we have

$$\sum_{i=1}^N \lambda_i \nabla\phi^i(x) = 0 \quad \text{for a.e. } x \in \mathrm{spt}(\bar\mu). \tag{3.5}$$

For each i, let $w^i$ be a barycenter of $\mu^i$; that is, $W_2^2(\delta_{w^i}, \mu^i) = \mathrm{var}(\mu^i)$. Let w
be a barycenter of $\bar\mu$.

Let $\mu^i_t = T^i_{t\#}\bar\mu = \big(\exp_x t\nabla\phi^i(x)\big)_\#\bar\mu$ be the corresponding displacement
interpolations (which are of course $W_2$ quasi-geodesics); note that $\mu^i_0 = \bar\mu$, $\mu^i_1 = \mu^i$.
Consider the geodesic $t \in [0,1] \mapsto \gamma^i(t) \in M$ with $\gamma^i(0) = w$, $\gamma^i(1) = w^i$, and
the function

$$t \mapsto \Phi(t) = \sum_{i=1}^N \lambda_i W_2^2(\delta_{\gamma^i(t)}, \mu^i_t).$$

Use (3.5) to compute

$$\frac{d}{dt}\Big|_{t=0}\Phi(t) = \sum_{i=1}^N \lambda_i \frac{d}{dt}\Big|_{t=0} W_2^2(\delta_{\gamma^i(t)}, \mu^i_t)$$
$$= \sum_{i=1}^N \lambda_i\Big(-2\int_M \langle \exp_x^{-1} w, \nabla\phi^i(x)\rangle\,d\bar\mu(x)\Big) \quad \text{(from Lemma 2.4)}$$
$$= -2\int_M \Big\langle \exp_x^{-1} w, \sum_{i=1}^N \lambda_i\nabla\phi^i(x)\Big\rangle\,d\bar\mu(x)$$
$$= 0 \quad \text{(from (3.5))}. \tag{3.6}$$

Note that $\Phi(t)$ is convex, since from Theorem 2.7 $W_2^2(\delta_{\gamma^i(t)}, \mu^i_t)$ is convex in
t. Combined with (3.6), convexity of $\Phi$ implies

$$\Phi(0) \le \Phi(1).$$

But, notice that

$$\Phi(0) = \sum_{i=1}^N \lambda_i W_2^2(\delta_w, \bar\mu) = \mathrm{var}(\bar\mu),$$
$$\Phi(1) = \sum_{i=1}^N \lambda_i W_2^2(\delta_{w^i}, \mu^i) = \int_{P(M)} \mathrm{var}(\mu)\,d\Omega(\mu).$$

This establishes the result in the first case.

Case 2: Next, we consider the case when $\Omega$ has a unique barycenter.

Noting that Wasserstein space P(M) over M is a Polish space, we can choose
a sequence $\Omega^N = \sum_{i=1}^N \lambda_i \delta_{\mu^i}$ of finitely supported measures on P(M) converging
weakly-* to $\Omega$, by [22, Theorem 6.18]. For each N, we can also choose at least one
of the $\mu^i$ to be absolutely continuous with respect to volume, by weak-* density
of absolutely continuous measures on M. By Lemma 3.2 and uniqueness, we
know that the barycenters $\bar\mu^N$ of $\Omega^N$ converge weakly to the barycenter $\bar\mu$ of $\Omega$.
Now, by the above,

$$\mathrm{var}(\bar\mu^N) \le \int_{P(M)} \mathrm{var}(\mu)\,d\Omega^N(\mu).$$

Now, take the limit as $N \to \infty$. The right hand side tends to $\int_{P(M)} \mathrm{var}(\mu)\,d\Omega(\mu)$
by Corollary 3.4. The left hand side tends to $\mathrm{var}(\bar\mu)$ by Lemma 3.3. This
completes the proof in the case when the barycenter is unique.

Case 3: Finally, we consider the general case.

Let $\bar\mu$ be a (not necessarily unique) barycenter of $\Omega$. By case 2 and Lemma
3.5 we have

$$\mathrm{var}(\bar\mu) \le \frac{1}{2}\mathrm{var}(\bar\mu) + \frac{1}{2}\int_{P(M)} \mathrm{var}(\mu)\,d\Omega(\mu),$$

which easily implies the desired result. □

Note that the above theorem does not hold if the curvature assumption is
removed, as the following example illustrates:

Example 3.7 (Sphere). Let M be the 2-dimensional round sphere of
circumference 2, i.e. the Riemannian distance from the north to south pole is 1.
Then, consider the two measures $\mu_0 = \delta_n$, $\mu_1 = \delta_s$, where n and s denote
the north and south pole, respectively. Let $\Omega = \frac{1}{2}(\delta_{\mu_0} + \delta_{\mu_1})$ on P(M) and
note that $\int_{P(M)} \mathrm{var}(\mu)\,d\Omega(\mu) = \frac{1}{2}\mathrm{var}(\delta_n) + \frac{1}{2}\mathrm{var}(\delta_s) = 0$. There are infinitely
many $W_2$-barycenters; namely, any probability measure supported on the
equator is a $W_2$-barycenter of $\Omega$. In particular, $\delta_z$ is a $W_2$-barycenter for any z
in the equator, which has vanishing variance. This does not violate the
inequality of Theorem 3.6. However, the uniform measure $\mu$ on the equator is also a
barycenter. Then note that $\mathrm{var}(\mu) = \int d^2(t,n)\,d\mu(t)$ for the north pole n, and
so $\mathrm{var}(\mu) = 1/4 > 0 = \int_{P(M)} \mathrm{var}(\mu)\,d\Omega(\mu)$.
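
The value 1/4 in this example can be checked numerically. The sketch below is an illustration under stated assumptions: the sphere is modelled with radius $1/\pi$ so that the pole-to-pole distance is 1, the uniform measure on the equator is replaced by equally spaced points, and the candidate barycenters are scanned only by latitude (which suffices by rotational and reflection symmetry). The function names are hypothetical.

```python
import math

R = 1.0 / math.pi  # radius chosen so that the pole-to-pole distance is 1


def dist_to_equator_point(lat, lon):
    """Geodesic distance from the point (latitude lat, longitude 0)
    to the equator point at longitude lon."""
    c = math.cos(lat) * math.cos(lon)
    return R * math.acos(max(-1.0, min(1.0, c)))


def mean_sq_dist(lat, n=3600):
    """Average of d^2(y, t) over n equally spaced equator points t,
    where y sits at latitude lat."""
    lons = [2.0 * math.pi * k / n for k in range(n)]
    return sum(dist_to_equator_point(lat, lon) ** 2 for lon in lons) / n


at_pole = mean_sq_dist(math.pi / 2)
at_equator = mean_sq_dist(0.0)
scan = [mean_sq_dist(math.pi / 2 * j / 10, n=360) for j in range(11)]
print(at_pole, at_equator, min(scan))
```

The mean squared distance is smallest at the pole, where it equals 1/4, and it is close to 1/3 for a base point on the equator itself; this is consistent with the north pole being a barycenter of the uniform equatorial measure.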

We close this section by noting a consequence of Theorem 3.6 which will be
relevant in the next section.

Corollary 3.8 (Variance gets reduced at the barycenter of an orbit of
isometries). Let G be a set of isometries on a complete simply connected manifold
M of nonpositive curvature. Let $\Omega$ be a probability measure on G. Consider a
probability measure $\mu$ on M and assume that there exists a large geodesic ball
that contains the union of the supports of the measures $g_\#\mu$ for all $g \in G$. Let
$\mu_\Omega$ be a $W_2$ barycenter of the measure $(g \mapsto g_\#\mu)_\#\Omega$ on P(M).

Then,

$$\mathrm{var}(\mu_\Omega) \le \mathrm{var}(\mu).$$

Proof. The corollary immediately follows from Theorem 3.6, since $\mathrm{var}(g_\#\mu) =
\mathrm{var}(\mu)$ for each isometry g. Note that under the nonpositive curvature and
simply connected assumption, each geodesic ball is geodesically convex. □

Remark 3.9. Although we do not pursue it here, by assuming some decay
conditions on the measure $\Omega$ on G, as well as considering the space $P_2(M)$ of
measures with finite variance, one may extend the above results to non compact
cases, in particular, to include isometry group actions on the whole Euclidean
space or the hyperbolic space.
4 Comparison with linear interpolation

In this section, we obtain, as a corollary to Theorem 3.6, a comparison result for
the variance functional between the linear barycenters and the $W_2$ barycenters.

We first consider the linear interpolation between probability measures,
$\mu_t = (1-t)\mu_0 + t\mu_1$. We then have that

$$t \mapsto \mathrm{var}(\mu_t) = \min_{y} \int_M d^2(x,y)\,d\mu_t(x)$$

is an infimum of affine functions, and hence concave. Define $BC^L(\Omega) \in P(M)$
to be the linear barycenter of the measure $\Omega$ on P(M); that is, for each Borel
$A \subset M$,

$$BC^L(\Omega)[A] := \int_{P(M)} \mu(A)\,d\Omega(\mu).$$

Then, the (linear) concavity of the variance and the classical (linear) Jensen’s
inequality imply

$$\mathrm{var}(BC^L(\Omega)) \ge \int_{P(M)} \mathrm{var}(\mu)\,d\Omega(\mu). \tag{4.1}$$

Note that this holds for any Riemannian manifold M; the sectional curvature
does not play a role. On the other hand, we have:

Corollary 4.1. Let M be a compact geodesically convex domain in a
complete manifold of nonpositive sectional curvature, $\Omega$ be a probability measure
on P(M), and $\bar\mu$ be its $W_2$-barycenter. Then,

$$\mathrm{var}(\bar\mu) \le \mathrm{var}(BC^L(\Omega)).$$

Proof. This immediately follows from the preceding inequality combined with
Theorem 3.6. □

This inequality seems quite intuitive to us. Consider the following, naive 
explanation. For simplicity, focus on the interpolation between two measures; 
displacement interpolation (the two measure case of W 2 barycenters) moves 
the support of one measure to the other continuously along geodesics, so that 
the support of the interpolant should not be much more spread out than the 
supports of the two original measures. On the other hand, the support of the 
linear interpolant is the union of the supports of the two original measures, 
and so we expect it to be more spread out (i.e., have higher variance) than the
displacement interpolant. However, this intuition is somewhat misleading, as it
does not require any assumptions on the curvature; as the following example
demonstrates, the nonpositive curvature condition in Corollary 4.1 is essential.

Example 4.2 (Balloon on a string). Consider the sphere $S^2$ of circumference
1 (the “balloon”), so the distance between the north and south poles is
$\frac{1}{2}$, with a line segment of length 1 (the “string”) attached to the
south pole. Let $x$ be the north pole, and set $x_0 = \exp_x v$,
$x_1 = \exp_x(-v)$ for some $v \in T_x M$ with $|v| = \epsilon < \frac{1}{4}$
(that is, $x_0$ and $x_1$ are found at the same distance from the north pole,
along opposite directions). Let $y$ be the point on the line segment at a
distance $\frac{1}{2} - \epsilon$ from the south pole.

Now, set $\mu_0 = \frac{1}{2}[\delta_y + \delta_{x_0}]$ and
$\mu_1 = \frac{1}{2}[\delta_y + \delta_{x_1}]$. Note that the south pole
is the barycenter of both of these measures, and they each have variance
$(\frac{1}{2} - \epsilon)^2$. It is then easy to see that the south pole is
also the barycenter of the linear interpolation
$\mu^{l}_{1/2} := \frac{1}{2}[\mu_0 + \mu_1]$ and that
$\operatorname{var}(\mu^{l}_{1/2}) = (\frac{1}{2} - \epsilon)^2$ as well.
On the other hand, the displacement interpolant is given by
$\mu_{1/2} := \frac{1}{2}[\delta_y + \delta_x]$ (recalling that $x$ is the
north pole), whose variance is given by

\[ \operatorname{var}(\mu_{1/2}) = \frac{d^2(x,y)}{4} = \frac{(1-\epsilon)^2}{4} > \operatorname{var}(\mu^{l}_{1/2}). \]

Although the metric space in the example is not a smooth manifold, it can
easily be smoothed out to construct smooth examples where the preceding
variance inequality holds.
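
The variances in Example 4.2 can be checked numerically. The sketch below models the balloon-on-a-string space directly: sphere points are given by colatitude and longitude, string points by their distance from the south pole, and the variance is approximated by a grid search over candidate barycenters. The grid sizes, the value $\epsilon = 0.1$, and all function names are illustrative assumptions rather than anything from the paper.

```python
import math

R = 1.0 / (2.0 * math.pi)  # sphere radius, so that great circles have length 1

def sphere_pt(theta, phi):
    """Point on the balloon: colatitude theta from the north pole, longitude phi."""
    return ("sphere", theta, phi)

def string_pt(s):
    """Point on the string at distance s from the south pole."""
    return ("string", s)

def dist(p, q):
    """Geodesic distance in the glued space (sphere joined to a segment at the south pole)."""
    if p[0] == "string" and q[0] == "string":
        return abs(p[1] - q[1])
    if p[0] == "sphere" and q[0] == "sphere":
        _, t1, f1 = p
        _, t2, f2 = q
        c = math.cos(t1) * math.cos(t2) + math.sin(t1) * math.sin(t2) * math.cos(f1 - f2)
        return R * math.acos(max(-1.0, min(1.0, c)))
    if p[0] == "string":
        p, q = q, p
    # p is on the sphere, q on the string: every path goes through the south pole.
    return R * (math.pi - p[1]) + q[1]

def candidates(n_theta=201, n_phi=16, n_s=201):
    """Grid of candidate barycenters covering the string and the sphere."""
    pts = [string_pt(k / (n_s - 1)) for k in range(n_s)]
    for i in range(n_theta):
        for j in range(n_phi):
            pts.append(sphere_pt(math.pi * i / (n_theta - 1), 2.0 * math.pi * j / n_phi))
    return pts

def variance(atoms, cands):
    """Grid approximation of var(mu) = min_z sum_i w_i d(z, a_i)^2."""
    return min(sum(w * dist(z, a) ** 2 for a, w in atoms) for z in cands)

eps = 0.1
x = sphere_pt(0.0, 0.0)             # north pole
x0 = sphere_pt(eps / R, 0.0)        # distance eps from the north pole
x1 = sphere_pt(eps / R, math.pi)    # same distance, opposite direction
y = string_pt(0.5 - eps)

cands = candidates()
var_mu0 = variance([(y, 0.5), (x0, 0.5)], cands)
var_lin = variance([(y, 0.5), (x0, 0.25), (x1, 0.25)], cands)   # linear interpolant
var_disp = variance([(y, 0.5), (x, 0.5)], cands)                # displacement interpolant

print("var(mu_0)               :", var_mu0)
print("var(linear interpolant) :", var_lin)
print("var(displ. interpolant) :", var_disp)
```

The grid search only gives an upper bound for each variance, but the grid contains the south pole exactly, so the first two values should agree with $(\frac{1}{2} - \epsilon)^2$ up to rounding, while the third should be close to $(1-\epsilon)^2/4$.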


5 $W_2$ Projection to the G-invariance set

In this section, we consider isometry group actions on the underlying space
$M$, which also induce isometry group actions on $P(M)$. We are interested in
the Wasserstein $W_2$-barycenter of the orbit of the group action, in relation to
functionals on $P(M)$. Our focus in this section is on the variance functional
for a nonpositively curved underlying space, so that we can use the results of
the preceding sections. Similar results hold for other examples, however; see
Remark 5.4 and Example 5.5.

We begin by showing that the projection onto the invariance set coincides
with the barycenter of the orbit under left Haar measure.

Proposition 5.1 (Projection to G-invariance set). Let $G$ be a group of
isometries on a Riemannian manifold $M$ and $H$ be a left invariant probability
measure on $G$ (here, the group $G$ has to be compact). For a given probability
measure $\mu \in P(M)$, assume that the barycenter $BC^{W}_{G}(\mu)$ of
$\Omega = (g \mapsto g_\#\mu)_\# H$ is unique. Define the $G$-invariant set
$I_G = \{\nu \in P(M) \mid g_\#\nu = \nu, \ \forall g \in G\}$.
Then,

1. $BC^{W}_{G}(\mu) \in I_G$.

2. $BC^{W}_{G}(\mu)$ is the unique $W_2$ projection of $\mu$ to $I_G$; that is,
$\{BC^{W}_{G}(\mu)\} = \operatorname{argmin}_{\nu \in I_G} W_2^2(\nu, \mu)$, or,
using the notation in the introduction,

\[ BC^{W}_{G}(\mu) = P^{W}_{G}(\mu). \]

This should be a well-known standard fact from metric geometry, but we
give its proof for completeness.

Remark 5.2. The uniqueness condition on the barycenter is satisfied when $\mu$
is absolutely continuous with respect to volume (as then each $g_\#\mu$ is clearly
absolutely continuous as well) by Proposition 3.1.

Proof. For simplicity of notation, we denote $\bar\mu = BC^{W}_{G}(\mu)$. We
prove the two assertions below:

1. For each $g, g' \in G$, by the isometry property, we have
$W_2(g_\#\bar\mu, (gg')_\#\mu) = W_2(\bar\mu, g'_\#\mu)$. Thus,

\begin{align*}
\int_G W_2^2(\bar\mu, g'_\#\mu)\, dH(g') &= \int_G W_2^2(g_\#\bar\mu, (gg')_\#\mu)\, dH(g') \\
&= \int_G W_2^2(g_\#\bar\mu, (gg')_\#\mu)\, dH(gg') \quad \text{(as $H$ is left invariant)}.
\end{align*}

This implies that $g_\#\bar\mu$ is a barycenter of $\Omega$; by the uniqueness
assumption, this shows $\bar\mu = g_\#\bar\mu$. As this holds for each
$g \in G$, we have $\bar\mu \in I_G$.

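2. Notice that if $\nu \in I_G$, then $W_2(\nu, g_\#\mu) = W_2(\nu, \mu)$ for all
$g \in G$. Thus,

\begin{align*}
W_2^2(\nu, \mu) &= \int_G W_2^2(\nu, \mu)\, dH(g) \\
&= \int_G W_2^2(\nu, g_\#\mu)\, dH(g) \\
&\ge \int_G W_2^2(\bar\mu, g_\#\mu)\, dH(g) \quad \text{(from the definition of $\bar\mu$)} \\
&= \int_G W_2^2(\bar\mu, \mu)\, dH(g) \quad \text{(since $\bar\mu \in I_G$ from 1.)} \\
&= W_2^2(\bar\mu, \mu).
\end{align*}

This shows that $\bar\mu$ is the minimum of $\{W_2^2(\nu, \mu)\}_{\nu \in I_G}$.
Noting that the inequality is strict unless $\nu = \bar\mu$, by the uniqueness
of the barycenter, completes the proof.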

□ 
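
To make Proposition 5.1 concrete, here is a minimal sketch in the simplest possible setting: $M$ is the real line and $G = \{\mathrm{id}, x \mapsto -x\}$ with $H$ giving mass 1/2 to each element. The measure $\mu$ is assumed to be a uniform empirical measure, so that $W_2$ distances and barycenters reduce to sorted matchings. The script checks that the barycenter of the orbit is reflection invariant and that randomly generated symmetric competitors with the same number of atoms are never closer to $\mu$. The function names and the choice of test measures are illustrative assumptions.

```python
import random

def w2_sq(a, b):
    """Squared W_2 distance between uniform empirical measures on R
    with equally many atoms (optimal coupling = sorted matching)."""
    return sum((s - t) ** 2 for s, t in zip(sorted(a), sorted(b))) / len(a)

def reflect(a):
    """Pushforward of the empirical measure under x -> -x."""
    return [-t for t in a]

def orbit_barycenter(a):
    """W_2 barycenter of {mu, reflected mu} with weights 1/2 each."""
    return [(s + t) / 2 for s, t in zip(sorted(a), sorted(reflect(a)))]

random.seed(1)
n = 200
mu = [random.expovariate(1.0) for _ in range(n)]
bary = orbit_barycenter(mu)

# Assertion 1 of the proposition: the barycenter lies in the invariant set.
invariance_defect = max(abs(s - t) for s, t in zip(sorted(bary), sorted(reflect(bary))))

# Assertion 2: symmetric competitors are at least as far from mu as the barycenter.
best = w2_sq(bary, mu)
gaps = []
for _ in range(100):
    r = [random.gauss(0.0, 2.0) for _ in range(n)]
    nu = orbit_barycenter(r)          # symmetrizing r yields a reflection-invariant measure
    gaps.append(w2_sq(nu, mu) - best)

print("invariance defect:", invariance_defect)
print("smallest gap W_2^2(nu, mu) - W_2^2(bary, mu):", min(gaps))
```

The competitors are restricted to measures with $n$ equally weighted atoms, so this only probes part of the invariant set; it is a consistency check, not a proof.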


An interesting consequence follows: 

Corollary 5.3 (Variance gets reduced at the $W_2$ projection to the invariant
set). Under the same notation as in Proposition 5.1, assume further that $M$ is
a complete, simply connected nonpositively curved manifold. For each absolutely
continuous probability measure $\mu$ with compact support,

\[ \operatorname{var}(P^{W}_{G}\mu) \le \operatorname{var}(\mu). \]

Proof. This immediately follows from Corollary 3.8 and Proposition 5.1. □

If $\mu$ is absolutely continuous, with a density $f$ in $L^2$, it is
straightforward to see that the linear barycenter $BC^{l}_{G}(\mu)$ of $\Omega$
is absolutely continuous as well, and that its density is given by

\[ \bar f(x) = \int_G f(g^{-1}(x))\, dH(g). \]

This $\bar f$ is the $L^2$ minimizer of the functional

\[ h \mapsto \int_G \| h - f \circ g^{-1} \|_{L^2}^2 \, dH(g) \]

and it also coincides with the $L^2$ projection $P^{L^2}_{G}(\mu)$ of $\mu$ onto
the subspace of $L^2$ functions which are invariant under the action of $G$.
From (4.1) and Corollary 5.3 we have

\[ \operatorname{var}(P^{W}_{G}(\mu)) \le \operatorname{var}(\mu) \le \operatorname{var}(P^{L^2}_{G}(\mu)). \tag{5.1} \]
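
A discrete analogue of (5.1) is easy to test. The sketch below again assumes $M = \mathbb{R}$ and $G = \{\mathrm{id}, x \mapsto -x\}$, and replaces the absolutely continuous $\mu$ by a uniform empirical measure, so the linear barycenter becomes the mixture of $\mu$ and its reflection rather than an $L^2$ projection of densities. The sample distribution and the variable names are illustrative.

```python
import random
import statistics

random.seed(2)
n = 300
mu = sorted(random.lognormvariate(0.0, 0.5) for _ in range(n))

# Wasserstein symmetrization: average the i-th smallest atom of mu with the
# i-th smallest atom of the reflected measure, which is -mu[n - 1 - i].
w2_proj = [(s - t) / 2 for s, t in zip(mu, reversed(mu))]

# Linear symmetrization: the mixture (mu + reflected mu) / 2.
lin_proj = mu + [-t for t in mu]

var_w2 = statistics.pvariance(w2_proj)
var_mu = statistics.pvariance(mu)
var_lin = statistics.pvariance(lin_proj)

print("var of Wasserstein symmetrization:", var_w2)
print("var of mu                        :", var_mu)
print("var of linear symmetrization     :", var_lin)
```

The three printed values should appear in nondecreasing order, matching the chain of inequalities in (5.1).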

In particular, if we consider the case $G = SO(n)$ acting on the Euclidean
space $\mathbb{R}^n$ (or any rotationally symmetric nonpositively curved metric
on $\mathbb{R}^n$, e.g. the hyperbolic metric), projecting onto the invariant
set can be interpreted as finding the best rotationally invariant approximation
of $\mu$. Finding the best approximation of a measure $\mu$, in the Wasserstein
sense, by a radially symmetric measure decreases the variance. On the other
hand, finding the best approximation of a measure by a radially symmetric
measure in the $L^2$ sense increases the variance.

Remark 5.4. It is worth noting that Corollary 5.3 holds whenever the variance
is replaced with any functional $F$ which is convex over $W_2$-barycenters
(including the three main types discovered in m whose convexity over $W_2$
barycenters was obtained in m on the Euclidean space and in m on Riemannian
manifolds with nonnegative Ricci curvature, and more generally on smooth metric
measure spaces satisfying the $CD(K,N)$ condition for $K \ge 0$), provided the
functionals are invariant under an isometry group $G$; that is, provided
$F(A_\#\mu) = F(\mu)$ for all $\mu$ and all $A \in G$. We feel the variance
case on Hadamard manifolds (another name for complete, simply connected
nonpositively curved manifolds) is of special interest as the opposite holds
true for linear barycenters. Linear and Wasserstein projections give two ways
to canonically generate a $G$-invariant measure from a given measure; one of
these decreases the variance while the other increases it.

Example 5.5. As an illustrative example, consider the entropy functional,
$F(f\, d\mathrm{vol}) = \int_M f(x) \ln(f(x))\, d\mathrm{vol}(x)$, on a manifold
with nonnegative Ricci curvature. It is well known that $F$ is displacement
convex; our recent work [8] extends this result to show that $F$ is in fact
convex over barycenters. For any absolutely continuous measure $\mu$ and any
compact group of isometries $G$ on $M$, we then get

\[ F(BC^{W}_{G}(\mu)) \le F(\mu). \tag{5.2} \]

In the particular case when $M = S^n$ is the round sphere and $G$ is the whole
rotation group, the barycenter $BC^{W}_{G}(\mu) = \mathrm{vol}/\mathrm{vol}(S^n)$
must be uniform measure, as this is the only probability measure on $S^n$ which
is invariant under this group. It is well known that uniform measure minimizes
the entropy, so this is consistent with (5.2). For smaller rotation groups $G$,
symmetrizing with respect to $G$ (that is, projecting onto the $G$-invariant
set) reduces the entropy, by (5.2).
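
Inequality (5.2) can be illustrated in one dimension, where the real line has zero (hence nonnegative) Ricci curvature. The sketch assumes $G = \{\mathrm{id}, x \mapsto -x\}$ and takes $\mu$ to be the exponential distribution with rate 1; both are illustrative choices. It relies on two standard one-dimensional facts that are not stated in the text above: if $Q$ is the quantile function of $\mu$, then $\int f \ln f = -\int_0^1 \ln Q'(u)\, du$, and the quantile function of the $W_2$-barycenter of $\mu$ and its reflection is $u \mapsto \frac{1}{2}(Q(u) - Q(1-u))$.

```python
import math

def entropy_from_quantile_derivative(dq, n=100_000):
    """Approximate F(mu) = int f ln f = -int_0^1 ln Q'(u) du with the midpoint rule."""
    h = 1.0 / n
    return -sum(math.log(dq((k + 0.5) * h)) for k in range(n)) * h

def dq_mu(u):
    """Derivative of the quantile function Q(u) = -ln(1 - u) of Exp(1)."""
    return 1.0 / (1.0 - u)

def dq_bary(u):
    """Derivative of the barycenter's quantile function (Q(u) - Q(1 - u)) / 2."""
    return 0.5 * (dq_mu(u) + dq_mu(1.0 - u))

F_mu = entropy_from_quantile_derivative(dq_mu)
F_bary = entropy_from_quantile_derivative(dq_bary)

print("F(mu)                 :", F_mu)
print("F(symmetrized measure):", F_bary)
```

For this particular $\mu$ the two integrals can also be computed by hand, giving $-1$ and $\ln 2 - 2$ respectively, so the symmetrized measure has the smaller value of $F$, as (5.2) predicts.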